Dataset Overview

The World Development Indicators (WDI) dataset, sourced from the World Bank, provides a comprehensive view of development metrics across countries and regions from 2013 to 2022. This dataset is ideal for exploring relationships among socio-economic, environmental, and political indicators, as well as observing trends and disparities across regions. More information about the WDI dataset, including variables, can be found here: https://cmustatistics.github.io/data-repository/politics/world-bank.html


Key Features:

  • Timeframe: Data covers ten years (2013–2022).
  • Scope: Covers 266 countries and regions, including aggregates such as “Sub-Saharan Africa.”
  • Variables: Features 40 indicators capturing diverse aspects of development.
  • Granularity: Each row represents a single country, territory, or region in a given year.
  • Limitations: Not all variables are available for all countries in all years, and more recent data is missing more often than older data.
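
The missingness noted in the last bullet can be checked directly before any analysis. The sketch below is a minimal illustration, assuming each row is a dictionary with a "year" key and that missing values are stored as None. The key names and the toy rows are hypothetical and are not taken from the WDI file.

```python
from collections import defaultdict

def missing_share_by_year(rows, variable):
    """Return {year: fraction of rows in that year where `variable` is None}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        total[row["year"]] += 1
        if row.get(variable) is None:
            missing[row["year"]] += 1
    return {year: missing[year] / total[year] for year in sorted(total)}

# Toy rows for illustration only; these are not real WDI values.
toy_rows = [
    {"country": "A", "year": 2013, "Literacy": 91.0},
    {"country": "B", "year": 2013, "Literacy": 78.5},
    {"country": "A", "year": 2022, "Literacy": None},
    {"country": "B", "year": 2022, "Literacy": 80.1},
]

print(missing_share_by_year(toy_rows, "Literacy"))  # {2013: 0.0, 2022: 0.5}
```

Running this once per variable gives a quick view of which indicators and years lose the most rows when incomplete observations are dropped.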

Variables Used in Analysis

To address our research questions, we selected the following six variables, representing key aspects of national prosperity. For each variable, its form and value range are described.


1. GDP per Capita (GDPperCapita)

  • Definition: The gross domestic product (GDP) divided by the total population of a country or region.
  • Form: Continuous numeric variable, measured in USD.
  • Range: Varies widely, from a few hundred dollars in low-income countries to over $100,000 in high-income nations.
  • Relevance: A critical measure of economic prosperity, often used to compare development levels across regions.

2. Internet Usage (Internet)

  • Definition: The percentage of the population with Internet access.
  • Form: Continuous numeric variable, measured as a percentage.
  • Range: 0% to 100%, where 0% indicates no Internet access and 100% indicates universal Internet access within the population.
  • Relevance: Reflects technological development and access to digital resources.

3. Birth Rate (Birth)

  • Definition: The crude birth rate, expressed as the number of live births per 1,000 people per year.
  • Form: Continuous numeric variable, measured in live births per 1,000 people.
  • Range: Typically from 5 (low birth rates in developed countries) to 50 (high birth rates in developing regions).
  • Relevance: Provides insights into population growth trends and socio-economic factors such as healthcare access.

4. Literacy Rate (Literacy)

  • Definition: The percentage of adults (15 years and older) who can read and write.
  • Form: Continuous numeric variable, measured as a percentage.
  • Range: 0% to 100%, where higher values indicate better educational outcomes.
  • Relevance: A strong indicator of human capital, with implications for economic productivity and quality of life.

5. Access to Electricity (Electricity)

  • Definition: The percentage of the population with access to electricity.
  • Form: Continuous numeric variable, measured as a percentage.
  • Range: 0% to 100%, where 0% indicates no access and 100% indicates universal access within the population.
  • Relevance: An essential infrastructure metric, reflecting living standards and economic development.

6. Political Stability (PoliticalStability)

  • Definition: A z-score reflecting the perceived likelihood of political instability or violence within a country, where higher values indicate greater stability.
  • Form: Continuous numeric variable, normalized as a z-score.
  • Range: Typically ranges from -2.5 (very unstable) to 2.5 (highly stable).
  • Relevance: Captures governance quality and security, crucial for understanding development risks.

Why These Variables?

These indicators were selected to represent a balanced view of economic, social, and political development:
- Economic: GDPperCapita and Electricity
- Technological: Internet
- Demographic: Birth
- Social: Literacy
- Political: PoliticalStability

Together, they provide a robust framework for examining regional clustering and disparities in national prosperity.


Research Question #[number]: How do geographic regions differ by various indicators of national prosperity?


To answer this question, we examine how geographic regions cluster on key metrics: GDP, Internet usage, birth rate, literacy rate, access to electricity, and political stability. Because a country's total GDP grows with its population, we divide GDP by population to create a transformed variable, GDPperCapita, that is comparable across countries.
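
The preparation step described above can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the "GDP" and "Population" keys, the toy values, and the choice to drop incomplete rows are all assumptions. Standardizing each variable to a z-score is also our assumption, not something the text states. Without it, GDPperCapita (in dollars) would dominate Euclidean distances over the percentage variables.

```python
import statistics

def add_gdp_per_capita(row):
    """Return a copy of `row` with GDPperCapita = GDP / Population (None if unavailable)."""
    out = dict(row)
    gdp, pop = row.get("GDP"), row.get("Population")
    out["GDPperCapita"] = gdp / pop if gdp is not None and pop else None
    return out

def complete_cases(rows, features):
    """Keep only rows with a value for every feature (assumed handling of missing data)."""
    return [r for r in rows if all(r.get(f) is not None for f in features)]

def standardize(rows, features):
    """Convert each feature column to z-scores; needs at least two rows."""
    columns = {}
    for f in features:
        values = [r[f] for r in rows]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        columns[f] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return [[columns[f][i] for f in features] for i in range(len(rows))]

# Toy rows for illustration only; these are not real WDI values.
toy = [
    {"GDP": 2.0e9, "Population": 1.0e6, "Internet": 20.0},
    {"GDP": 9.0e9, "Population": 1.5e6, "Internet": 50.0},
    {"GDP": 4.0e10, "Population": 1.0e6, "Internet": 80.0},
    {"GDP": 1.0e9, "Population": 2.0e6, "Internet": None},  # dropped: incomplete
]
features = ["GDPperCapita", "Internet"]
prepared = complete_cases([add_gdp_per_capita(r) for r in toy], features)
matrix = standardize(prepared, features)  # one standardized row per remaining observation
```

The resulting matrix has one row per observation and one column per variable, which is the input a distance-based method such as MDS expects.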




The 2D MDS plot above suggests some clustering of Sub-Saharan Africa, Europe & Central Asia, and Latin America & Caribbean, along with overlap among the other regions, but it is difficult to distinguish all six regions. We could create side-by-side plots for each cluster, but doing so would make it hard to gauge the distances between clusters. Instead, we use plotly to create an interactive 3D MDS plot that separates the clusters more clearly.
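
For readers unfamiliar with MDS, the idea behind these embeddings can be sketched with classical (Torgerson) MDS. This is an assumption made for illustration: the text does not say which MDS variant or library produced the plots. The function names below are hypothetical, and the eigenvectors are found with simple power iteration only to keep the sketch free of dependencies. In practice a library routine would normally be used instead.

```python
import math
import random

def pairwise_distances(points):
    """Euclidean distance matrix between the rows of `points`."""
    return [[math.dist(p, q) for q in points] for p in points]

def classical_mds(dist, k=2, iters=500, seed=0):
    """Embed n items in k dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into the Gram matrix of centered points.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration toward the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no structure left to extract
                break
            v = [x / norm for x in w]
        vv = sum(x * x for x in v)
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n)) / vv  # Rayleigh quotient
        scale = math.sqrt(max(eigval, 0.0) / vv)
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j] / vv
    return coords

# Toy points for illustration only; real input would be standardized indicator rows.
toy_points = [[0.0, 0.0], [4.0, 0.0], [0.0, 1.0], [4.0, 1.0], [2.0, 3.0]]
toy_dist = pairwise_distances(toy_points)
embedding_2d = classical_mds(toy_dist, k=2)
embedding_3d = classical_mds(toy_dist, k=3)
```

Calling the same function with k=3 yields three coordinates per observation, which can then be passed to a 3D scatter plot and colored by region.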








The 3D MDS plot above shows clearer separation among the regional clusters, which vary in spread. Sub-Saharan Africa and Middle East & North Africa are the most distinct of the regions on the chosen indicators. By comparison, the other four regions show noticeable overlap, especially Europe & Central Asia and Latin America & Caribbean, suggesting regional similarities. Together, the two MDS plots suggest meaningful differences and similarities across regions on these important metrics of national prosperity.
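
One rough way to complement the visual reading of overlap is to compare distances between region centroids in the embedded space. The sketch below is only an illustration under stated assumptions: the function names, region labels, and coordinates are hypothetical, and centroid distance ignores how spread out each cluster is.

```python
import math
from collections import defaultdict
from itertools import combinations

def region_centroids(coords, regions):
    """Average the embedded coordinates of all points sharing a region label."""
    groups = defaultdict(list)
    for point, region in zip(coords, regions):
        groups[region].append(point)
    return {region: [sum(axis) / len(points) for axis in zip(*points)]
            for region, points in groups.items()}

def centroid_distances(centroids):
    """Euclidean distance between every pair of region centroids."""
    return {(a, b): math.dist(centroids[a], centroids[b])
            for a, b in combinations(sorted(centroids), 2)}

# Toy 3D coordinates and labels for illustration only; not results from the analysis.
toy_coords = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0], [12.0, 0.0, 0.0]]
toy_regions = ["Region X", "Region X", "Region Y", "Region Y"]

centroids = region_centroids(toy_coords, toy_regions)
print(centroid_distances(centroids))  # {('Region X', 'Region Y'): 10.0}
```

Larger centroid distances point to regions that sit further apart on the chosen indicators, while small distances are consistent with the overlap seen in the plot.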